Training OCR Systems Using Variants of Ideal Images
Author
Abstract
This paper shows that the learning stage of high-performance character classifiers can be achieved using only ideal images and/or simple variants of them. We are interested in 94 character classes of the ASCII character set. We use a software tool to generate binary ideal images of these characters. The ideal images are supplemented with simple variants derived from them to represent the character classes. Learning in a nearest neighbor classifier (type-A) is performed using the ideal images of the characters and their variants. Pixel intensity values are used as features. To judge the effectiveness of this classifier, a second nearest neighbor classifier (type-B) is built and trained using real images. Both classifiers are tested on a real dataset. The overall recognition, error, and rejection rates of a three-variant type-A classifier are 98.5%, 1.5%, and 0.0%, respectively. The recognition rate of the type-B classifier exceeds that of the three-variant type-A classifier by only 1.2%. Using other kinds of classifiers and multiple-classifier technology is expected to produce even more impressive results for the type-A classifier. The type-A classifier that uses three variants requires 57 ms to recognize a character; the type-B classifier requires less than half of that time.
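The abstract does not give implementation details, so the following is only a minimal sketch of a type-A-style classifier, assuming characters are normalised to a fixed binary grid, raw pixel intensities form the feature vector, Euclidean distance is the matching criterion, and the "simple variants" are one-pixel shifts of the ideal image. The glyph size, the variant choice, and the class name NearestNeighbourOCR are illustrative assumptions, not the paper's exact method.

import numpy as np

H, W = 24, 16  # assumed normalised glyph size; the paper does not specify one

def simple_variants(ideal):
    # The paper's "simple variants" are not detailed; one-pixel shifts are an assumption.
    return [ideal, np.roll(ideal, 1, axis=1), np.roll(ideal, 1, axis=0)]

class NearestNeighbourOCR:
    # Hypothetical type-A-style classifier: pixel intensities as features,
    # Euclidean distance to stored templates (the metric is an assumption).
    def __init__(self):
        self.labels = []
        self.templates = []

    def train(self, ideal_images):
        # ideal_images: dict mapping class label -> HxW binary ideal image
        for label, ideal in ideal_images.items():
            for v in simple_variants(ideal):
                self.labels.append(label)
                self.templates.append(v.astype(float).ravel())
        self.matrix = np.stack(self.templates)

    def classify(self, image):
        # Return the label of the closest stored template.
        x = image.astype(float).ravel()
        distances = np.linalg.norm(self.matrix - x, axis=1)
        return self.labels[int(np.argmin(distances))]

# Toy usage with two dummy "ideal" glyphs (the paper uses 94 ASCII classes).
ideal_a = np.zeros((H, W), dtype=np.uint8); ideal_a[4:20, 3:13] = 1
ideal_b = np.zeros((H, W), dtype=np.uint8); ideal_b[2:22, 6:10] = 1
ocr = NearestNeighbourOCR()
ocr.train({"A": ideal_a, "B": ideal_b})
print(ocr.classify(np.roll(ideal_a, 1, axis=1)))  # expected output: A

In practice, storing several variants per class multiplies the number of templates, which is consistent with the reported trade-off that the three-variant type-A classifier is slower per character than the type-B classifier.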
Similar Articles
Prediction of OCR accuracy using a Neural Network
A method for predicting the accuracy achieved by an OCR system on an input image is presented. It is assumed that there is an ideal prediction function. A neural network is trained to estimate the unknown ideal function. In this project, multilayer perceptrons were trained to predict the character accuracy performance of two OCR systems using the backpropagation training method. The results sho...
FONT DISCRIMINATION USING FRACTAL DIMENSIONS
One of the related problems of OCR systems is discrimination of fonts in machine-printed document images. This task improves the performance of general OCR systems. The methods proposed in this paper are based on various fractal dimensions for font discrimination. First, some predefined fractal dimensions were combined with directional methods to enhance font differentiation. Then, a novel fractal dime...
An Automatic Closed-Loop Methodology for Generating Character Groundtruth for Scanned Documents
Character groundtruth for real, scanned document images is crucial for evaluating the performance of OCR systems, training OCR algorithms, and validating document degradation models. Unfortunately, manual collection of accurate groundtruth for characters in a real (scanned) document image is not practical because (i) accuracy in delineating groundtruth character bounding boxes is not high enoug...
An Automatic Closed-Loop Methodology for Generating Character Groundtruth
Character groundtruth for real, scanned document images is extremely useful for evaluating the performance of OCR systems, training OCR algorithms, and validating document degradation models. Unfortunately, manual collection of accurate groundtruth for characters in a real (scanned) document image is not possible because (i) accuracy in delineating groundtruth character bounding boxes is not hi...
Journal:
Volume, Issue:
Pages: -
Publication year: 2005